feat(backend): packed-quant matmul dispatch in DefaultCpuOpsBase (wor…#711

Merged
michalharakal merged 2 commits into develop from feature/708-native-quant-kernels on Jun 8, 2026
@michalharakal

Contributor

…ks on Native)

Part of #708. Makes ops.matmul(x, ops.transpose(W)) route packed-quant weights to a kernel on EVERY KMP target. Before this, the packed-quant matmul dispatch + lazy transpose lived only in DefaultCpuOpsJvm, so on Kotlin/Native/JS/WASM a packed weight fell through to matmulGeneric, which throws on Byte-packed data — packed matmul was effectively broken off-JVM.

  • New Q5_1/Q5_0 packed tensor-data types + TensorEncoding.Q5_0/Q5_1 (lang-core).
  • DefaultCpuOpsBase: chooseQuantizedMatmulHeap resolves the kernel via the commonMain KernelRegistry (scalar floor on Native/JS/WASM; Panama/FFM on JVM via the ensureKernelProviders() hook + ServiceLoader) and dispatches FP32 × packed {Q8_0,Q4_0,Q4_K,Q6_K,Q5_1,Q5_0}. It also adds lazy-transpose shape-swap branches for the four heap K/Q5 types. The JVM ops keep their MemSeg/SIMD fast paths and intercept Q4_K/Q6_K/Q8_0/Q4_0 before the base, so there is zero JVM regression by construction; Q5_1/Q5_0 (and the whole set on non-JVM) resolve in the base.
  • Non-JVM platform factories (linux/apple/js/wasm/wasmWasi/android) register ScalarKernelProvider (no ServiceLoader off-JVM).
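
For readers unfamiliar with the pattern, the registry resolution described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchRegistry`, `KernelProvider`, `DotKernel`, `ScalarProvider`, `FastProvider`, and the `priority` field are all hypothetical names, and the "fast" provider simply reuses the scalar loop as a stand-in for an accelerated kernel.

```kotlin
// Hypothetical sketch: these types are illustrative, not SKaiNET's real API.
enum class Encoding { Q8_0, Q4_K }

fun interface DotKernel {
    // Toy contract: dot product of an FP32 row with one row of quantized bytes.
    fun dot(x: FloatArray, q: ByteArray): Float
}

interface KernelProvider {
    val name: String
    val priority: Int                               // higher priority wins
    fun kernelFor(encoding: Encoding): DotKernel?   // null means "not supported"
}

// Always-available fallback ("scalar floor"): supports every encoding.
object ScalarProvider : KernelProvider {
    override val name = "scalar"
    override val priority = 0
    override fun kernelFor(encoding: Encoding): DotKernel? = DotKernel { x, q ->
        var acc = 0f
        for (i in x.indices) acc += x[i] * q[i]
        acc
    }
}

// Stand-in for an accelerated provider that only covers some encodings.
object FastProvider : KernelProvider {
    override val name = "fast"
    override val priority = 10
    override fun kernelFor(encoding: Encoding): DotKernel? =
        if (encoding == Encoding.Q8_0) ScalarProvider.kernelFor(encoding) else null
}

class SketchRegistry {
    private val providers = mutableListOf<KernelProvider>()

    // Explicit registration, since there is no ServiceLoader off-JVM.
    fun register(p: KernelProvider) {
        if (providers.none { it.name == p.name }) providers += p
    }

    // Pick the highest-priority provider that supports the encoding.
    fun resolve(encoding: Encoding): Pair<String, DotKernel>? =
        providers.sortedByDescending { it.priority }
            .firstNotNullOfOrNull { p -> p.kernelFor(encoding)?.let { k -> p.name to k } }
}

fun main() {
    val nativeLike = SketchRegistry().apply { register(ScalarProvider) }
    val jvmLike = SketchRegistry().apply { register(ScalarProvider); register(FastProvider) }
    println(nativeLike.resolve(Encoding.Q8_0)?.first) // scalar
    println(jvmLike.resolve(Encoding.Q8_0)?.first)    // fast
    println(jvmLike.resolve(Encoding.Q4_K)?.first)    // scalar
}
```

The point of the sketch is that a target with only the scalar provider registered still resolves a kernel for every encoding, while a target with an extra provider prefers it where it applies and falls back to the scalar one elsewhere.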

Tests: PackedMatmulDispatchTest (commonTest) runs Q4_K + Q5_1 through ctx.ops.matmul(x, transpose(W)) and matches the dequant reference — green on jvmTest AND linuxX64Test (the Native end-to-end proof). Full backend-cpu jvmTest suite passes (no regression); apiDump regenerated for lang-core + backend-cpu.
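
The "matches the dequant reference" check can be illustrated with a toy format. The sketch below is an assumption-heavy stand-in, not the actual PackedMatmulDispatchTest: it uses a simplified Q8_0-like layout (32 signed bytes per block, with the per-block scale kept as a `Float` in a separate array, whereas real GGUF Q8_0 stores an FP16 scale inside each block), and `PackedQ8`, `TransposedView`, `matmulPacked`, and `matmulReference` are hypothetical names. The transposed view only swaps the reported shape, which is the idea behind a lazy transpose.

```kotlin
import kotlin.math.abs
import kotlin.math.cos
import kotlin.math.roundToInt
import kotlin.math.sin

const val BLOCK = 32

// Toy packed weight with logical shape [rows, cols]; cols must be a multiple of BLOCK.
class PackedQ8(val rows: Int, val cols: Int, val quants: ByteArray, val scales: FloatArray)

// Lazy transpose: report the swapped shape, leave the packed bytes untouched.
class TransposedView(val base: PackedQ8) {
    val rows get() = base.cols
    val cols get() = base.rows
}

fun quantize(w: FloatArray, rows: Int, cols: Int): PackedQ8 {
    require(cols % BLOCK == 0)
    val q = ByteArray(rows * cols)
    val s = FloatArray(rows * cols / BLOCK)
    for (b in s.indices) {
        val start = b * BLOCK
        var amax = 0f
        for (i in 0 until BLOCK) amax = maxOf(amax, abs(w[start + i]))
        val scale = if (amax == 0f) 1f else amax / 127f
        s[b] = scale
        for (i in 0 until BLOCK)
            q[start + i] = (w[start + i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return PackedQ8(rows, cols, q, s)
}

// Packed path: x is [m, k], wT is the lazy transpose of W [n, k]; result is [m, n].
fun matmulPacked(x: FloatArray, m: Int, k: Int, wT: TransposedView): FloatArray {
    require(wT.rows == k)
    val n = wT.cols
    val w = wT.base
    val blocksPerRow = k / BLOCK
    val out = FloatArray(m * n)
    for (r in 0 until m) for (c in 0 until n) {
        var acc = 0f
        for (b in 0 until blocksPerRow) {
            var blockSum = 0f
            for (i in 0 until BLOCK) {
                val kk = b * BLOCK + i
                blockSum += x[r * k + kk] * w.quants[c * k + kk]
            }
            acc += blockSum * w.scales[c * blocksPerRow + b] // apply scale once per block
        }
        out[r * n + c] = acc
    }
    return out
}

// Reference path: dequantize W fully, then do a dense x * W^T.
fun matmulReference(x: FloatArray, m: Int, k: Int, w: PackedQ8): FloatArray {
    val dense = FloatArray(w.rows * w.cols) { i -> w.quants[i] * w.scales[i / BLOCK] }
    val n = w.rows
    return FloatArray(m * n) { idx ->
        val r = idx / n
        val c = idx % n
        var acc = 0f
        for (kk in 0 until k) acc += x[r * k + kk] * dense[c * k + kk]
        acc
    }
}

fun main() {
    val m = 2; val k = 64; val n = 3
    val w = quantize(FloatArray(n * k) { sin(it * 0.37f) }, n, k)
    val x = FloatArray(m * k) { cos(it * 0.11f) }
    val packed = matmulPacked(x, m, k, TransposedView(w))
    val reference = matmulReference(x, m, k, w)
    val maxDiff = packed.indices.maxOf { abs(packed[it] - reference[it]) }
    println("max abs diff vs dequant reference = $maxDiff")
    check(maxDiff < 1e-3f)
}
```

Both paths read the same quantized values, so they should differ only by floating-point rounding from the different summation order; the tolerance of 1e-3 is an arbitrary choice for this toy data.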

michalharakal and others added 2 commits June 8, 2026 11:07
…ks on Native)

Part of #708. Makes `ops.matmul(x, ops.transpose(W))` route packed-quant weights
to a kernel on EVERY KMP target. Before this, the packed-quant matmul dispatch +
lazy transpose lived only in DefaultCpuOpsJvm, so on Kotlin/Native/JS/WASM a packed
weight fell through to matmulGeneric, which throws on Byte-packed data — packed
matmul was effectively broken off-JVM.

- New Q5_1/Q5_0 packed tensor-data types + TensorEncoding.Q5_0/Q5_1 (lang-core).
- DefaultCpuOpsBase: `chooseQuantizedMatmulHeap` resolves the kernel via the
  commonMain KernelRegistry (scalar floor on Native/JS/WASM; Panama/FFM on JVM via
  the ensureKernelProviders() hook + ServiceLoader) and dispatches FP32 × packed
  {Q8_0,Q4_0,Q4_K,Q6_K,Q5_1,Q5_0}; lazy-transpose shape-swap branches for the four
  heap K/Q5 types. The JVM ops keep their MemSeg/SIMD fast paths and intercept
  Q4_K/Q6_K/Q8_0/Q4_0 before the base — zero JVM regression by construction; Q5_1/Q5_0
  (and the whole set on non-JVM) resolve in the base.
- Non-JVM platform factories (linux/apple/js/wasm/wasmWasi/android) register
  ScalarKernelProvider (no ServiceLoader off-JVM).

Tests: PackedMatmulDispatchTest (commonTest) runs Q4_K + Q5_1 through
ctx.ops.matmul(x, transpose(W)) and matches the dequant reference — green on
jvmTest AND linuxX64Test (the Native end-to-end proof). Full backend-cpu jvmTest
suite passes (no regression); apiDump regenerated for lang-core + backend-cpu.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-quant-kernels

# Conflicts:
#	skainet-lang/skainet-lang-core/src/commonMain/kotlin/sk/ainet/lang/tensor/data/Q5_0TensorData.kt
#	skainet-lang/skainet-lang-core/src/commonMain/kotlin/sk/ainet/lang/tensor/data/Q5_1TensorData.kt
@github-actions

github-actions Bot commented Jun 8, 2026


📖 Documentation Preview

The documentation has been built successfully for this PR.

Generated Files:

  • Operator documentation: docs/modules/operators/_generated_/
  • JSON schema output: operators.json

Artifacts:

  • Download the documentation-preview-711 artifact to view the complete documentation locally.

This comment will be updated automatically when the PR is updated.
